Ultra-Short-Term Wind Power Forecasting Based on CGAN-CNN-LSTM Model Supported by Lidar

Accurate prediction of wind power is of great significance to the stable operation of the power system and the development of the wind power industry. To further improve the accuracy of ultra-short-term wind power forecasting, an ultra-short-term wind power forecasting method based on the CGAN-CNN-LSTM algorithm is proposed. First, the conditional generative adversarial network (CGAN) is used to fill in the missing segments of the data set. Then, the convolutional neural network (CNN) is used to extract features from the data and is combined with the long short-term memory network (LSTM) to construct a feature extraction module; an attention mechanism is added after the LSTM to assign weights to the features and accelerate model convergence, yielding the CGAN-CNN-LSTM ultra-short-term wind power forecasting model. Next, the position and function of each sensor in the Sole du Moulin Vieux wind farm in France are introduced. Using the sensor observation data of the wind farm as a test set, the CGAN-CNN-LSTM model is compared with the CNN-LSTM, LSTM, and SVM models to verify its feasibility. In addition, to demonstrate the generality of the model and the capability of the CGAN, a controlled experiment against a CNN-LSTM model combined with linear interpolation is carried out on a data set from a wind farm in China. The test results show that the CGAN-CNN-LSTM model is not only more accurate in its predictions but also applicable across regions, which is of value for the development of wind power.


Introduction
In recent years, as traditional non-renewable energy sources have been increasingly depleted by continuous use, renewable clean energy sources have flourished under strong state support, and wind power has developed particularly rapidly, with installed capacity increasing year by year. By the end of September 2022, the cumulative installed capacity of wind power in the country was 348 million kilowatts, an increase of 17% year-on-year, of which the newly installed capacity reached 5.339 million kilowatts, an increase of 0.72% year-on-year, and power generation increased steadily. During the first three quarters of 2022, national wind power generation reached 158.1 billion kilowatt-hours, an increase of 26.5% year-on-year, accounting for 13.8% of total power generation, making wind power an important part of the national energy supply [1]. However, as the proportion of wind power continues to increase, the randomness and volatility of wind power generation can easily lead to insufficient or redundant output. The large-scale integration of wind power into the power grid greatly increases the difficulty of making power generation plans and causes the frequency and voltage of the grid to fluctuate, which has a negative impact on the internal operation of the power system [2] and thereby restricts the scale of wind power development. Through

• Proposed an ultra-short-term wind power forecasting model based on the CGAN-CNN-LSTM algorithm and verified its feasibility;
• Used GAN's data supplement function to solve the problem of missing data in the original data set;
• The time scale of ultra-short-term wind power prediction is shortened to 5 min, which improves the prediction accuracy.
The rest of the paper is organized as follows. Section 2 introduces the basic principles of each algorithm used in this model, including formulas and structural diagrams. In Section 3, these algorithms are combined into a complete prediction model, with emphasis on the model's data supplement function for incomplete data sets, and the operation of the CGAN-CNN-LSTM wind power prediction model is then described step by step. Section 4 presents the verification by calculation example. First, the layout of the Sole du Moulin Vieux wind farm in France is introduced, especially the sensors scattered around the wind turbines. Then, using the data set obtained by lidar, the four months of February, May, August, and November are selected as representatives of the four seasons of the year. Four machines in the wind farm are selected as test targets for each month, and this model is compared with the CNN-LSTM wind power forecasting model over multiple rounds. The line charts and evaluation functions verify that the forecasting accuracy of this model is better. Finally, Section 5 summarizes the full text, discusses the results, and explains the research significance of this model.

Introduction to Algorithm Theory
To make the technical terms and abbreviations easier to follow, this paper includes an abbreviation table, shown in Table 1.

Generative Adversarial Network
The generative adversarial network (GAN) is a deep learning model [24] that produces good output through adversarial learning between two modules in the framework: the generative model and the discriminative model [25,26]. The random noise $z$, drawn from the distribution $P_z(z)$, is introduced into the generator, and the distribution of the mapping function is recorded as $P_g(z)$. The discriminator receives the generated data and the real data, distributed as $P_{data}$, at the same time, and outputs 0 or 1 to indicate whether a sample is real or generated. The generator needs to make its output distribution converge to the real data distribution so as to fool the discriminator, and the two improve each other's performance through this confrontation. The process is shown in Figure 1.
The objective function of the discriminator D is:
$$\max_D V(D, G) = E_{X \sim P_{data}(X)}[\log D(X)] + E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$
In the formula: $E_{X \sim P_{data}(X)}[\log D(X)]$ represents the value of the real data and $E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$ represents the value of the data generated by the generator. The larger the sum of the two results, the better the effect of the discriminator.
The objective function of the generator G is:
$$\min_G V(D, G) = E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$
Since $G(z)$ is a generated data distribution, the total objective function can be changed to:
$$\min_G \max_D V(D, G) = E_{X \sim P_{data}(X)}[\log D(X)] + E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$
In the formula: $\max_D V(D, G)$ maximizes the value function of the discriminator, which outputs 1 for real data and 0 for fake data. $\min_G$ minimizes the value function with respect to the generator; that is, it is hoped that $D(G(z))$ is close to 1, which indicates that the generated data distribution is close to the real data distribution.
Since the samples generated by the original GAN algorithm are random and unpredictable, the generated target is not clear, the controllability is weak, and there are certain limitations. This paper therefore uses the conditional generative adversarial network (CGAN) [27,28]. The CGAN is constructed by adding a label $y$ to the data: additional conditional information is added to the input of both the discriminator and the generator, which constrains the data generated by the generator [29]. Only data that is sufficiently real and satisfies the condition can be accepted by the discriminator. The objective function of the CGAN is:
$$\min_G \max_D V(D, G) = E_{X \sim P_{data}(X)}[\log D(X|y)] + E_{z \sim P_z(z)}[\log(1 - D(G(z|y)))]$$
In the formula: $D(X|y)$ and $G(z|y)$ represent the condition $y$ introduced by the CGAN, and the characteristic distribution of the sample $X$ is calculated when the condition $y$ is known.
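The value function above can be estimated from a batch of discriminator outputs. The following is a minimal sketch, not the training code used in this paper: the function names and the probability values are made up for illustration, and the CGAN case would use the same expressions with the discriminator and generator additionally conditioned on the label y.

```python
import math

def discriminator_value(d_real, d_fake):
    """Batch estimate of V(D, G) = E[log D(X)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(X) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_value(d_fake):
    """Generator term E[log(1 - D(G(z)))], which the generator tries to minimize."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# Illustrative (made-up) discriminator outputs.
# A discriminator that separates real from generated data scores higher
# than one that outputs 0.5 everywhere.
sharp = discriminator_value([0.9, 0.95], [0.1, 0.05])
undecided = discriminator_value([0.5, 0.5], [0.5, 0.5])
print(sharp > undecided)
```

The generator term becomes smaller as the discriminator assigns higher probabilities to generated samples, which is the direction in which the generator is pushed.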

Convolutional Neural Network
The CNN is a feed-forward neural network with a convolutional structure, which consists of an input layer, a convolutional layer, a pooling layer, and a fully connected layer [30,31]. It has a wide range of applications in image recognition, natural language processing, and remote sensing science. Compared with the traditional multi-layer neural network, the CNN adds a convolutional layer and a pooling layer [32] in front of the fully connected layer, which makes feature extraction more effective. The formula for the feature extraction of a one-dimensional convolution for the time series is:
$$Y = \sigma(W * T + b)$$
In the formula: $Y$ is the extracted feature; $\sigma$ is the sigmoid activation function; $W$ is the weight matrix; $*$ denotes the convolution operation; $T$ is the time series; $b$ is the bias vector.
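As an illustration of the one-dimensional convolution described above, the sketch below slides a single kernel over a short series and applies the sigmoid. It assumes a "valid" convolution without padding and, as in most deep learning libraries, does not flip the kernel; the kernel, bias, and series values are made up.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv1d_feature(series, kernel, bias):
    """Slide one kernel over the series and apply the sigmoid to each window."""
    k = len(kernel)
    out = []
    for start in range(len(series) - k + 1):
        window = series[start:start + k]
        pre_activation = sum(w * t for w, t in zip(kernel, window)) + bias
        out.append(sigmoid(pre_activation))
    return out

# Made-up series T, kernel W, and bias b.
features = conv1d_feature([0.1, 0.4, 0.3, 0.8, 0.6], kernel=[0.5, -0.5], bias=0.0)
print(len(features))
```

A series of length 5 and a kernel of size 2 give 4 output features, each between 0 and 1.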

Attention-LSTM Network
LSTM is an efficient RNN architecture, which overcomes the problems of gradient disappearance and gradient explosion produced by RNN networks when dealing with long-term dependency problems [33]. The core concept of LSTM is cell state and "gate" structure, and each LSTM unit is composed of cell state, forget gate, input gate, and output gate. The LSTM structure is shown in Figure 2.

Set the input time series as $x = \{x_1, x_2, \ldots, x_t\}$, and the two output sequences after LSTM mapping are $h = \{h_1, h_2, \ldots, h_t\}$ and $y = \{y_1, y_2, \ldots, y_t\}$. The forget gate in the LSTM unit decides which information should be discarded or retained, and its formula is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
In the formula: $\sigma$ represents the sigmoid function; $W$ and $b$ are the parameters of the training network. The forget gate reads the previous output $h_{t-1}$ and the current input $x_t$ and passes them through the sigmoid function to obtain the output $f_t$. The output value is between 0 and 1. If it is close to 0, the information is deleted, and if it is close to 1, it is retained.
The input gate determines what new information is stored in the cell state. It consists of two steps, and its formula is:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
In the formula: $\tilde{C}_t$ represents the new vector created by the tanh layer. The input gate obtains the data processed by the sigmoid and tanh functions, respectively, and combines the two into the cell state.
The cell state is updated from $C_{t-1}$ to $C_t$; the formula is:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
The output gate determines what value needs to be output in the end; the formula is:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
It can be seen from the formula that the input data processed by the sigmoid function is multiplied by the cell state data processed by the tanh function, and the result is the output.
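The gate operations described above can be traced in a few lines of code. This is a minimal sketch of one standard LSTM time step using plain Python lists; the `params` layout and the all-zero weights in the example are illustrative assumptions, not the trained parameters of this paper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(weights, vector, bias):
    """Return W @ vector + b for a matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    concat = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = [sigmoid(v) for v in affine(W_f, concat, b_f)]        # forget gate
    i_t = [sigmoid(v) for v in affine(W_i, concat, b_i)]        # input gate
    c_tilde = [math.tanh(v) for v in affine(W_c, concat, b_c)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(v) for v in affine(W_o, concat, b_o)]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hidden size 1, input size 1, all-zero parameters (illustrative only):
# every gate then outputs 0.5 and the candidate state is 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([0.3], [0.0], [1.0], params)
print(h_t, c_t)
```

With all gates at 0.5, the previous cell state of 1.0 is halved, which shows the forget gate acting as a retention factor.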
The attention mechanism is derived from the study of human vision. Humans are used to selecting key parts of all information to remember, while forgetting other information [34]. For this prediction model, in order to make it focus more on the key information in the sequence, this paper adopts an attention-LSTM model, as shown in Figure 3.
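To make the idea of weighting LSTM outputs concrete, the sketch below implements a generic attention pooling: each LSTM output is scored, the scores are normalized with softmax, and the outputs are combined with the resulting weights. The scoring function `tanh(w . s_i + b)` and the parameters `w` and `b` are assumptions chosen for illustration and may differ from the attention formulas used in this paper.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, w, b):
    """Score each state, normalize the scores, and return the weighted sum."""
    scores = [math.tanh(sum(wi * si for wi, si in zip(w, s)) + b) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(a * s[d] for a, s in zip(weights, states)) for d in range(dim)]
    return weights, context

# Three made-up LSTM outputs s_1..s_3 of dimension 2.
states = [[0.1, 0.2], [0.9, 0.7], [0.4, 0.1]]
weights, context = attention_pool(states, w=[1.0, 1.0], b=0.0)
print(weights, context)
```

The weights sum to 1, and the state with the largest score (the second one here) contributes most to the pooled representation.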
After adding the attention mechanism, the output of each step of the LSTM is calculated together with the current output, and finally the softmax function is used to generate a probability value. In Figure 3, after the input sequence $x_1, x_2, x_3, \ldots, x_t$ passes through the LSTM unit, the output sequence $s_1, s_2, s_3, \ldots, s_t$ is obtained. $Wk_i$ is the attention weight of each feature, and $u$ is the feature representation. After attention processing, the output sequence $y_1, y_2, y_3, \ldots, y_t$ is obtained. In the formulas of the attention mechanism, $B_i$ is the probability distribution value of each attention weight, $Hk_i$ is the attention mechanism matrix, and $b$ is the bias. Finally, the softmax function is used to obtain the predicted data $y_t$.

CGAN-CNN-LSTM Prediction Model
Figure 4 is the structural diagram of the CGAN-CNN-LSTM wind power prediction model, which is divided into two stages: data processing and power prediction.
In the data processing stage, random samples first enter the generator to generate data, which are then input into the discriminator together with the real samples. The discriminator judges the authenticity of the two sets of data, and its output is used to update the generator and the discriminator through backpropagation. Under the continuous confrontation and updating of the generator and the discriminator, the data from the generator finally come very close to the data distribution of the real samples, and the discriminator cannot distinguish the real samples from the generated data. Finally, the trained CGAN network is used to fill in the data [35].
In the power prediction stage, the CNN-LSTM model is selected. The CNN local feature extraction module is a Conv1D layer with 64 convolution kernels of size 4, followed by a batch normalization layer and a MaxPool1D layer. An attention module is added to the LSTM. The complete data are first normalized before being input into the CNN network for local feature extraction. The normalization formulas are as follows:
$$x_{nor} = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (16)$$
$$y_{nor} = \frac{y - y_{min}}{y_{max} - y_{min}} \quad (17)$$
In the formula: $x_{nor}$ represents the normalized data of the meteorological characteristics; $x$ represents the original meteorological data; $x_{max}$ and $x_{min}$ represent the maximum and minimum values of the meteorological characteristics; $y_{nor}$ represents the normalized data of the original wind power; $y$ represents the original wind power data; $y_{max}$ and $y_{min}$ represent the maximum and minimum values of the wind power data.
The normalized data are put into the convolutional and pooling layers for local feature extraction and stacking, and the data with extracted feature information are then predicted in the LSTM. Finally, to ensure that the data have physical meaning, the prediction results must be denormalized after the prediction is completed. The formula is as follows:
$$y^{*}_{dnor} = (y_{max} - y_{min})\, y^{*} + y_{min} \quad (18)$$
In the formula: $y^{*}_{dnor}$ represents the dimensioned wind power sequence after denormalization; $y^{*}$ represents the predicted value of the wind power.
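The min-max normalization and the denormalization of Formula (18) are inverse operations, which the following minimal sketch illustrates. The function names and the power values are made up for illustration.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] and return the scaled list with the minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(normalized, lo, hi):
    """Map normalized values back to the original scale, as in Formula (18)."""
    return [(hi - lo) * v + lo for v in normalized]

# Made-up wind power readings in kW.
power_kw = [120.0, 450.0, 980.0, 1500.0]
scaled, y_min, y_max = min_max_normalize(power_kw)
restored = denormalize(scaled, y_min, y_max)
print(scaled, restored)
```

In practice the minimum and maximum would be taken from the training data and reused for the test data, so that both are mapped with the same scale.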

Missing Value Supplementation
When processing the data, it was found that, because of factors such as failure of the data collection equipment, weather, or collection errors, there were obvious gaps in the data, so the trained CGAN network was used to fill in the missing data. The formula is as follows:
$$\hat{X} = M \odot \tilde{X} + (1 - M) \odot \bar{X}$$
In the formula: $\bar{X}$ represents the filling value; $\tilde{X}$ represents the data vector containing the missing values; $M$ represents the mask vector, whose elements take the value 0 or 1 and mark the positions of the missing data; $\hat{X}$ represents the final complete data vector. The data in $\bar{X}$ are used for places with missing data, and the data in $\tilde{X}$ are used for places without missing data.
The filling process first removes the bad data from the original data set according to the missing rate, obtaining the data vector from which the complete data set is built. Finally, $\hat{X}$ and a hint vector, Hint, are input into the discriminator. After the same information feedback and confrontation optimization between the generator and the discriminator as before, the optimal solution is obtained, and a complete data set is output.
The function of Hint is to carry out certain interference, to prevent the trained CGAN from being unable to continue training, and to speed up convergence at the same time. The principle of missing value supplementation is shown in Figure 5.
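The mask-based combination of observed and generated values described above can be sketched as follows. The sketch assumes that a mask value of 1 marks an observed entry and 0 marks a missing one, and that missing entries are stored as a 0.0 placeholder; the numbers are made up, and the generator output is simply given rather than produced by a trained CGAN.

```python
def fill_missing(observed, generated, mask):
    """Keep observed entries where mask is 1 and use generated entries where mask is 0."""
    return [m * x + (1 - m) * g for x, g, m in zip(observed, generated, mask)]

# Made-up wind speed readings; 0.0 is a placeholder at the missing positions.
x_tilde = [5.2, 0.0, 6.1, 0.0]
# Made-up generator output for every position.
x_bar = [5.0, 5.7, 6.0, 6.4]
# Assumed convention: 1 = observed, 0 = missing.
mask = [1, 0, 1, 0]

x_hat = fill_missing(x_tilde, x_bar, mask)
print(x_hat)
```

Observed readings pass through unchanged, so the generator only influences the positions that were actually missing.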

Wind Power Forecasting Process
Based on the three neural networks of CGAN, CNN, and LSTM, this paper constructs a combined model of the CGAN for data screening and supplementation and the CNN-LSTM for prediction, and adds an attention mechanism to the LSTM to accelerate the model convergence and improve wind power prediction accuracy. The prediction process is specifically divided into the following steps.
1. After importing the data set, it can be seen that a large section of data is missing. Through the CGAN, the data set is interpolated and filled to form a complete data set;
2. For the convenience of model calculation, the data are normalized; to give the data physical meaning, the predicted results need to be denormalized;
3. A heatmap is drawn to show clearly the correlation between each characteristic value and the wind power [36];
4. The data set is divided into a test set and a training set, and the input and output sets are each organized by time steps. After the rows and columns are exchanged, the attention mechanism is applied to assign dynamic weights to the feature values;
5. By repeatedly training the model and comparing it with the test set, the group with the best evaluation function is determined, and the establishment of the prediction model is completed.
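The organization of the data by time steps and the split into training and test sets can be sketched as below. This is an illustrative assumption about the windowing, not the exact procedure of this paper: each sample pairs a fixed number of consecutive feature rows with the next power value, and the split is chronological. The window length, the 5/6 training fraction (mirroring 5 training days and 1 test day), and the data are made up.

```python
def make_windows(features, targets, time_steps):
    """Pair each block of `time_steps` consecutive feature rows with the next target."""
    inputs, outputs = [], []
    for end in range(time_steps, len(features)):
        inputs.append(features[end - time_steps:end])
        outputs.append(targets[end])
    return inputs, outputs

def chronological_split(inputs, outputs, train_fraction):
    """Split without shuffling so that the test samples follow the training samples in time."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], outputs[:cut]), (inputs[cut:], outputs[cut:])

# Made-up data: 10 time points with 2 features each and one target per point.
features = [[float(i), float(i) * 0.5] for i in range(10)]
targets = [float(i) * 2.0 for i in range(10)]

X, y = make_windows(features, targets, time_steps=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, train_fraction=5 / 6)
print(len(train_X), len(test_X))
```

Keeping the split chronological avoids leaking future observations into the training set, which matters for forecasting tasks.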

Lidar Wind Power Data Collection
At present, most wind farms build an anemometer tower to observe the wind conditions at the site continuously around the clock and then record and store the measurement data in the data recorder installed on the tower body. However, the anemometer tower has disadvantages, such as demanding site selection, difficult maintenance, and high cost, which bring large investment risks and loss of income to the construction, operation, and maintenance of wind farms. Lidar is light and portable, easy to install, simple to operate, and accurate in its wind measurements, and it is now widely used in the wind power field at home and abroad.
Wind Iris is the first wind turbine nacelle wind lidar developed by the French company Leosphere (Paris, France). It can measure the wind speed and wind direction in the range of 40~400 m directly in front of the hub of the wind turbine. Real-time data and statistics can be automatically transmitted via the data protocol or stored on the device itself. Wind Iris emits two laser beams at the same time to measure the wind speed at the hub height in front of the unit, and the wind speed measured by the two laser beams is processed to obtain the actual wind speed at the hub height at the measured position. However, Wind Iris cannot measure data such as wind direction, air pressure, temperature, and humidity.
The WindCube V2 land-based lidar was also developed by Leosphere (Paris, France). It measures wind speed, wind direction, and other indicators from the frequency change that moving aerosol particles in the air impose on the pulsed laser. Sensors for air pressure, temperature, and humidity are embedded in the lidar wind measurement system. The WindCube V2 lidar emits four laser beams at the same time, measures the wind speed at four points within each layer height, weights the measurements, and obtains the average wind speed and wind direction of that layer height.
Both lidars use the principle of the laser pulse Doppler frequency shift: by measuring the Doppler frequency shift of the aerosol backscatter echo signal, accurate real-time wind field data and aerosol backscatter data are obtained, from which wind speed and wind direction information is inverted [37,38]. The specific working principle is as follows. Let the laser-emitting module and the receiving module be the inertial system S, and the measured object be the inertial system A; the speed of the inertial system A relative to the inertial system S is $V$. When the laser source emits a beam of laser light with frequency $f_0$ toward the measured object, the laser frequency at the measured object in the inertial system A is, to first order in $V/c$:
$$f_A = f_0 \left(1 + \frac{V \cos\alpha}{c}\right)$$
The emitted laser light is backscattered by aerosol particles, and part of the light is reflected back to the detector. In the inertial system S, the laser frequency at the receiving module is:
$$f_S = f_A \left(1 + \frac{V \cos\beta}{c}\right)$$
Therefore, there is a frequency difference between the local oscillator light and the echo signal, and this frequency difference is the Doppler frequency shift, namely:
$$\Delta f = f_S - f_0 \approx \frac{V}{\lambda}(\cos\alpha + \cos\beta) \quad (23)$$
Here $\alpha$ and $\beta$ are the angles between the direction of motion of the measured object and the emitted and received beams, respectively, and $c$ is the speed of light. The above Doppler frequency shift formula is applicable to the off-axis coherent wind lidar system. For the transceiver coaxial system, the laser transmitting module and the receiving module are the same telescope, so $\alpha = \beta = \theta$ and Formula (23) can be simplified as:
$$\Delta f = \frac{2V\cos\theta}{\lambda}$$
Among them, $\lambda$ is the laser wavelength, $V$ is the moving speed of the measured object, and $\theta$ is the angle between the measured object and the emitting laser.
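For the coaxial case, the relation between the Doppler shift and the speed of the measured object can be inverted directly. The sketch below assumes the standard coaxial relation, shift = 2 V cos(theta) / lambda; the wavelength of 1.55 micrometres, the speed, and the angle are illustrative values, not instrument specifications taken from this paper.

```python
import math

def doppler_shift(speed, wavelength, angle_rad):
    """Coaxial Doppler shift in Hz: 2 * V * cos(theta) / lambda."""
    return 2.0 * speed * math.cos(angle_rad) / wavelength

def speed_from_shift(shift_hz, wavelength, angle_rad):
    """Invert the coaxial relation to recover the speed V in m/s."""
    return shift_hz * wavelength / (2.0 * math.cos(angle_rad))

# Illustrative values only.
wavelength = 1.55e-6          # metres, assumed laser wavelength
theta = math.radians(15.0)    # assumed angle between the motion and the beam
shift = doppler_shift(10.0, wavelength, theta)
recovered = speed_from_shift(shift, wavelength, theta)
print(shift, recovered)
```

The round trip recovers the assumed speed of 10 m/s, which is the inversion step a lidar performs for each beam before the beams are combined into wind speed and direction.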

Data Sources
The data set used in this paper is the measured data of the Sole du Moulin Vieux (SMV) wind farm in France provided by ENGIE Green [39,40]. The data include NWP data and real-time wind power data of seven wind turbines. The wind farm is equipped with an advanced lidar system and adopts a coordinated control strategy based on the coefficient of power (CP) [41]. Lidar is an important sensor in surveying and mapping, used mainly for ranging, positioning, and three-dimensional rendering of surface objects. In this wind farm, it is mainly used to collect data such as wind speed, wind direction, and temperature. All turbines were equipped with a supervisory control and data acquisition (SCADA) system allowing 1 Hz data for the most critical variables to be recorded. SMV6 is equipped with an Orion 5-beam lidar for measuring the free-flow wind. A Vaisala Triton sodar was installed in the proximity of turbines SMV5 and SMV6, and a Leosphere (Paris, France) Windcube V1 ground lidar was installed between SMV2 and SMV3 [42]. The lidar measured the wind speed at heights of 40 m to 200 m at a frequency of 1 Hz. Although data from the sodar and profiling lidar were not used extensively during the analysis, they were used to cross-check and validate wind measurements from the turbines and to identify the best references for assessing the ambient wind conditions. The scanning lidar was installed on the east side, 1.2 km away from the wind farm, and it can measure the wind at the hub height of SMV6 [43]. For the lidar-based analyses presented here, we used the estimated horizontal wind speeds and wind directions at hub height provided by the lidar.

Experimental Platform and Evaluation Criteria
Based on the TensorFlow 2.2 deep learning library in Python 3.7, a prediction model was built, and the Adam optimization algorithm was introduced to iteratively update the weights of the neural network; the number of iterations is 100. Finally, the evaluation functions root mean square error (RMSE) and R-squared ($R^2$) are used to judge the precision of the model; the formulas are as follows.
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
In the formulas: $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the number of observations.
The root mean square error (RMSE) is the square root of the sum of the squared deviations between the predicted values and the true values divided by the number of observations $n$; it is a typical indicator for regression models. The smaller the RMSE, the better the prediction model. $R^2$ is 1 minus the ratio of the sum of the squared distances between the observations and the predicted values to the sum of the squared distances between the observations and their mean. The closer the $R^2$ value is to 1, the better the prediction model.
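The two evaluation functions follow directly from their verbal definitions above. The following is a minimal sketch with made-up normalized power values; the function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between true and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up normalized wind power values.
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

For these values every prediction is off by 0.05, so the RMSE is 0.05 and the $R^2$ is 0.95 up to floating-point rounding.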

Experimental Results and Model Comparison
Since the CGAN-CNN-LSTM model is relatively advanced, a comparison with the BP neural network, Elman neural network, and other simpler prediction models would not be informative, so the CNN-LSTM, LSTM, and SVM prediction models were selected for comparative experiments. First, we imported all the data sets. From Figure 6, it can be seen that the data in the first part and the data in the fifth part are obviously missing. Figure 7 shows the effect after supplementing the data with the CGAN; the supplemented data are highlighted in different colors.
We selected February, May, August, and November from the one-year data set, with 5 days per month as the training set and 1 day as the test set. Turbines 1, 3, 5, and 7 were selected as test units. The wind speed, wind direction, and temperature in the NWP data are used as the input data, and the output is the wind power. Figure 8 shows the influence of the wind speed, wind direction, and temperature on the wind power, among which wind speed has the greatest influence and wind direction has the least. Figure 9 shows the test results in February, Figure 10 shows the test results in May, Figure 11 shows the test results in August, and Figure 12 shows the test results in November. In the simulation diagrams, the 300 sampling points are divided according to the time scale, with one sampling point every 5 min, to predict the wind power of a day.
Sensors 2023, 23, x FOR PEER REVIEW 12 of 21
Table 2 shows the two evaluation functions, $R^2$ and RMSE, of the four forecasting models in February, and Tables 3-5 show the same content for May, August, and November, respectively.
For the whole year, the figures show that the forecasts for February and August are better than those for May and November. This is because February and August are windy months, with strong and relatively stable wind speeds and smaller fluctuations in wind power, which makes them easier to predict. The figures also show that the fitting curves of SMV5 and SMV7 are clearly better than those of SMV1 and SMV3. This is because a small number of wind speed, temperature, or wind direction records are missing from the data sets of the SMV1 and SMV3 units and therefore do not match the power values at the same time, which affects the power prediction of the model.
the average values of RMSE are 0.0774, 0.0885, 0.0903, and 0.0956, respectively. Compared with CNN-LSTM, the best model in the control experiment, CGAN-CNN-LSTM increased R² by 2.45% and decreased RMSE by 12.5%, which shows that this model predicts wind power more accurately.
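For reference, the two evaluation metrics and the relative change quoted above can be computed as in the sketch below. It uses the standard definitions of RMSE and the coefficient of determination R²; the function names and the toy series are illustrative. The check at the end assumes that the first two RMSE averages in the text (0.0774 and 0.0885) belong to CGAN-CNN-LSTM and CNN-LSTM respectively, which the text implies but does not state explicitly.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def relative_change_percent(baseline, value):
    """Change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Toy normalized power series (illustrative values only).
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(round(rmse(actual, predicted), 4))      # 0.05
print(round(r2_score(actual, predicted), 4))  # 0.95

# RMSE averages from the text (assumed order: CGAN-CNN-LSTM, CNN-LSTM).
print(round(relative_change_percent(0.0885, 0.0774), 1))  # -12.5
```

A lower RMSE and an R² closer to 1 both indicate a better fit, so the negative relative change in RMSE corresponds to the reported 12.5% decrease.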
To show that the model is applicable to wind farms around the world, and to verify the difference between the CGAN and general interpolation methods, a control experiment was added using a wind farm in China. The data of the wind farm from March 1 to March 5 were used as the training set, and the data from March 6 were used as the test set. The CGAN-CNN-LSTM model and the CNN-LSTM model with linear interpolation (L-CNN-LSTM) were used to make predictions, and the test results of the four units were compared. Figure 13 and Table 6 show the test results in March; Table 6 lists the R² and RMSE evaluation metrics of the two models.
In the experiment on the wind farm in China, the average R² of the CGAN-CNN-LSTM prediction model is 0.927 and that of the L-CNN-LSTM prediction model is 0.887; the average RMSE of the CGAN-CNN-LSTM prediction model is 0.0812 and that of the L-CNN-LSTM prediction model is 0.0926. Compared with the L-CNN-LSTM prediction model, the R² of this model increases by 4.5% on average and the RMSE decreases by 12.3% on average, so its predictions are closer to the actual wind power curve.
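The linear interpolation used as the baseline gap-filling method can be illustrated with the short sketch below. It is a generic implementation, not the paper's pre-processing code: missing samples are represented by `None`, the function name `linear_interpolate` is hypothetical, and gaps at the edges of the series are filled with the nearest observed value, which is an assumption made here for completeness.

```python
def linear_interpolate(series):
    """Fill None gaps with straight lines between the nearest known samples.

    Gaps at the start or end of the series have only one known neighbour
    and are filled with that neighbour's value (an assumption of this sketch).
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    # Edge gaps: copy the nearest observed value.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Interior gaps: interpolate between consecutive observed samples.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


wind_speed = [6.0, None, None, 9.0, 8.0, None, 4.0]
print(linear_interpolate(wind_speed))
# [6.0, 7.0, 8.0, 9.0, 8.0, 6.0, 4.0]
```

Because every gap is replaced by a straight segment, short-term fluctuations inside a gap cannot be reproduced. The comparison with CGAN-based filling in this experiment is aimed at that limitation.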

Conclusions
In response to the increasing accuracy requirements of wind power forecasting, this paper proposes a combined CGAN-CNN-LSTM forecasting model. Data from 4 typical months for 4 units in the French wind farm were selected as the test data set, and CNN-LSTM, LSTM, and SVM were chosen as the comparison algorithms. Compared with CNN-LSTM, the best model in the control experiment, CGAN-CNN-LSTM increased R² by 2.45% and decreased RMSE by 12.5%, which shows that this model predicts wind power more accurately. Then, to demonstrate the universality of the model and the capability of the CGAN algorithm, a wind farm in China was selected as a data set and the model was compared with L-CNN-LSTM. The results show that the R² of this model increases by 4.5% on average and the RMSE decreases by 12.3% on average, so its predictions are closer to the actual wind power curve. This also shows that the model is applicable to different wind farms around the world. The main features of this model are as follows:
1. The CGAN is used to fill in the missing data of the NWP data set to obtain a complete data set.
2. The CNN is used to extract features from the data set, and the LSTM algorithm is then used to predict wind power.
3. An attention mechanism is added to the LSTM algorithm so that the model pays more attention to the key information in the sequence, which speeds up convergence and improves model accuracy.
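The weighting idea behind the attention mechanism in the third feature can be sketched as below. This is a simplified illustration, not the paper's architecture: it assumes dot-product scoring against a fixed `query` vector (in a trained network this vector would be learned), and the hidden-state values and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product score, then sum.

    `hidden_states` holds one vector per time step (e.g. LSTM outputs).
    Returns the attention weights and the weighted context vector.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Three time steps with two hidden features each (toy values).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.1]]
query = [2.0, 1.0]  # assumed fixed here; learned in a real model
weights, context = attention_pool(hidden_states, query)
print(weights.index(max(weights)))  # 1
```

The weights sum to 1, and the time step whose hidden state aligns best with the query (the second one here) contributes most to the context vector passed on to the output layer.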
The model addresses the problem of partially missing data in the original data set and provides new ideas and methods for improving the accuracy of ultra-short-term wind power prediction. However, the model also has shortcomings: when filling in missing data, it has difficulty simulating the real data if the values before and after the gap differ greatly. In the future, better variants in the GAN family will be developed, and models based on them will make wind power prediction more accurate.